Introduction to Freva

Martin Bergemann, Bijan Fallah, Andrej Fast, Mostafa Hadizadeh, Christopher Kadow, Etor E. Lucio-Eceiza, Felix Oertel, Manuel Reis, and many others…

Deutsches Klimarechenzentrum @ CLINT in Data Analysis Dpt. (+ Data Management Dpt.*)

Common Problem I: Finding Data

Common Problem II: Using code of others

Common Problem III: Reproducing your results

  • How can we search and access various datasets efficiently?
  • How can we streamline user data analysis tools (reusable and reproducible)?

Yet another solution: the Freva framework

  • Flexibility
  • Standardisation
  • Centralisation
  • Transparency

Let's get an overview...

Flexible access

Freva is a (mainly) Python 3 framework

Running on DKRZ's HPC system, it comes in three flavours:

  • Command Line Interface (CLI)
  • Web User Interface
  • Python module

Each interface offers similar and interconnected features

Standardised data

  • CMOR: maps FROM/TO several ESGF standards (CMIP6, CORDEX, pseudoCMIP5, nextgems flavours)
  • Data ingested via Apache Solr
  • More than 10 million files available
  • Intuitive queries & fast results
  • Metadata preview
  • Time selection
  • Generates reproducible freva commands & URLs
  • Indexing of POSIX files, tape archives and intake catalogs in multiple formats (NetCDF, Zarr…)
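The search features listed above (facet queries, time selection, reproducible commands and URLs) can be illustrated with a small in-memory model. This is a sketch under stated assumptions. The file records and facet values are hypothetical. The `freva databrowser key=value` command form and the base URL `https://freva.example/databrowser` are placeholders. Real Freva queries go through Apache Solr rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlencode


@dataclass
class FileRecord:
    """Hypothetical metadata record; real Freva keeps these in an Apache Solr index."""
    path: str
    facets: dict
    start: date
    end: date


# Hypothetical example index with three files.
INDEX = [
    FileRecord("/data/cmip6/tas_day_MPI-ESM1-2-LR_historical_1850-1869.nc",
               {"project": "cmip6", "variable": "tas", "time_frequency": "day"},
               date(1850, 1, 1), date(1869, 12, 31)),
    FileRecord("/data/cmip6/tas_day_MPI-ESM1-2-LR_historical_1870-1889.nc",
               {"project": "cmip6", "variable": "tas", "time_frequency": "day"},
               date(1870, 1, 1), date(1889, 12, 31)),
    FileRecord("/data/cordex/pr_day_EUR-11_1950-1970.nc",
               {"project": "cordex", "variable": "pr", "time_frequency": "day"},
               date(1950, 1, 1), date(1970, 12, 31)),
]


def search(facets, start=None, end=None):
    """Return paths whose facets all match and whose time span overlaps [start, end]."""
    hits = []
    for rec in INDEX:
        if any(rec.facets.get(k) != v for k, v in facets.items()):
            continue
        # Time selection: keep only files that overlap the requested window.
        if start is not None and rec.end < start:
            continue
        if end is not None and rec.start > end:
            continue
        hits.append(rec.path)
    return hits


def reproducible_query(facets, base_url="https://freva.example/databrowser"):
    """Render the same query as a (hypothetical) CLI command and a shareable URL."""
    # Sorting gives identical strings for identical queries, whatever the key order.
    ordered = sorted(facets.items())
    command = "freva databrowser " + " ".join(f"{k}={v}" for k, v in ordered)
    return command, f"{base_url}?{urlencode(ordered)}"


query = {"project": "cmip6", "variable": "tas"}
print(search(query, start=date(1860, 1, 1), end=date(1865, 12, 31)))
print(*reproducible_query(query), sep="\n")
```

Sorting the facets before rendering is what makes the command and URL reproducible. The same search always yields byte-identical strings that can be pasted into a terminal or shared.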

Easy incorporation of tools

  • Flexible programming language: NO specific language (only free software!)
  • Standardised API in Python 3: no need to know every tool's code environment
  1. Tool: ANY language (Python, R, C, FORTRAN, a mix...)
./movie_plotter.sh /path/2/INPUT /path/2/OUTPUT
  2. Plugin API (wrapper): in Python
import shlex

from evaluation_system.api import plugin
from evaluation_system.api.parameters import (ParameterDictionary as ParamDict, String)


class MoviePlotter(plugin.PluginAbstract):
    """Freva wrapper around the movie_plotter.sh tool."""

    __short_description__ = "Plots 2D lon/lat movies in GIF format"
    __version__ = (0, 0, 1)
    __parameters__ = ParamDict(
        String(name="input", default=None, mandatory=True, help="File to plot"),
        String(name="outputdir", default=None, mandatory=True, help="Output directory"),
    )

    def run_tool(self, config_dict=None):
        # config_dict holds the user's parameter values, checked against __parameters__.
        input_file = config_dict["input"]  # avoid shadowing the built-in input()
        outputdir = config_dict["outputdir"]
        # Call the wrapped tool, which lives next to this wrapper; quote paths for the shell.
        self.call(f"{self.class_basedir}/movie_plotter.sh "
                  f"{shlex.quote(input_file)} {shlex.quote(outputdir)}")
        # Collect the files written to outputdir as the results of this run.
        return self.prepare_output(outputdir)
  3. Freva command:
freva plugin MoviePlotter input=/foo/bar.nc outputdir=/path/2/OUTPUT
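The `freva plugin MoviePlotter input=... outputdir=...` call turns `key=value` pairs into the `config_dict` that `run_tool` receives. The sketch below models that step in plain Python. The `Param` class and the `parse_plugin_args` helper are hypothetical stand-ins for Freva's `ParameterDictionary`, not its real implementation.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for one entry of a plugin's parameter dictionary."""
    name: str
    default: object = None
    mandatory: bool = False
    help: str = ""


MOVIE_PLOTTER_PARAMS = [
    Param("input", mandatory=True, help="File to plot"),
    Param("outputdir", mandatory=True, help="Output directory"),
]


def parse_plugin_args(params, argv):
    """Map CLI tokens like 'input=/foo/bar.nc' onto a config dict."""
    known = {p.name: p for p in params}
    config = {p.name: p.default for p in params}  # start from the defaults
    for token in argv:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        if key not in known:
            raise ValueError(f"unknown parameter {key!r}")
        config[key] = value
    # Refuse to run if a mandatory parameter is still unset.
    missing = [p.name for p in params if p.mandatory and config[p.name] is None]
    if missing:
        raise ValueError(f"missing mandatory parameter(s): {', '.join(missing)}")
    return config


config_dict = parse_plugin_args(
    MOVIE_PLOTTER_PARAMS, ["input=/foo/bar.nc", "outputdir=/path/2/OUTPUT"]
)
print(config_dict)  # {'input': '/foo/bar.nc', 'outputdir': '/path/2/OUTPUT'}
```

The parameter declarations are the single place where names, defaults and mandatory flags live. Because of that, the CLI, the web form and the Python module can all validate the same configuration in the same way.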

Transparency & reproducibility

  • Every config stored in a (MariaDB) database
  • Every config is searchable
  • Every config can be modified & re-run
  • History stores the Git versions of both the plugin and the Freva system!
  • Saves CPU hours, I/O and storage!

Additionally:

  • Results can be shared and commented
  • Configs can be compared with previous similar runs
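The history idea can be sketched with Python's built-in `sqlite3`: store every configuration, make it searchable, and reuse an identical earlier run instead of recomputing it. SQLite stands in for Freva's MariaDB here. The table layout, the `record_run`/`find_previous_run` helpers and the version strings are hypothetical, not Freva's actual schema.

```python
import json
import sqlite3

# In-memory SQLite stands in for Freva's MariaDB history database (assumption).
db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE history (
           id INTEGER PRIMARY KEY,
           plugin TEXT, plugin_version TEXT, system_version TEXT,
           config TEXT, result_dir TEXT)"""
)


def canonical(config):
    """Serialise a config so that equal dicts always give the same string."""
    return json.dumps(config, sort_keys=True)


def record_run(plugin, plugin_version, system_version, config, result_dir):
    """Store one run together with the (hypothetical) Git versions used."""
    cur = db.execute(
        "INSERT INTO history (plugin, plugin_version, system_version, config, result_dir)"
        " VALUES (?, ?, ?, ?, ?)",
        (plugin, plugin_version, system_version, canonical(config), result_dir),
    )
    return cur.lastrowid


def find_previous_run(plugin, plugin_version, config):
    """Return the result dir of the latest identical run, or None."""
    row = db.execute(
        "SELECT result_dir FROM history"
        " WHERE plugin = ? AND plugin_version = ? AND config = ?"
        " ORDER BY id DESC LIMIT 1",
        (plugin, plugin_version, canonical(config)),
    ).fetchone()
    return row[0] if row else None


cfg = {"input": "/foo/bar.nc", "outputdir": "/path/2/OUTPUT"}
record_run("MoviePlotter", "abc123", "def456", cfg, "/path/2/OUTPUT")

# Same config with keys in a different order: reuse the stored result.
print(find_previous_run("MoviePlotter", "abc123",
                        {"outputdir": "/path/2/OUTPUT", "input": "/foo/bar.nc"}))
# A modified config has no match, so it would be run anew.
print(find_previous_run("MoviePlotter", "abc123", {**cfg, "input": "/foo/baz.nc"}))
```

Keying the lookup on both the plugin version and a canonical form of the config is what allows a result to be reused safely. If the plugin code changes, its version changes too, and the old result no longer matches.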

... And there is more

A development environment

One can test a plugin without interference:

  • Already exists? Overwrite it locally
  • New? Plug it in locally

Similar behaviour with the CLI:

  • To plug in: export the EVALUATION_SYSTEM_PLUGINS environment variable
  • Multiple local plugins allowed
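`EVALUATION_SYSTEM_PLUGINS` is assumed here to be a list of `path,module` pairs separated by colons. Check that format against the Freva documentation for your installation. The sketch below builds such a value and shows the override rule from the slide: a local plugin with the same name replaces the central one, while a new name is simply added. All plugin names, paths and helper functions are hypothetical.

```python
import os


def plugin_env_value(entries):
    """Join (path, module) pairs into 'path,module:path,module' form (assumed format)."""
    return ":".join(f"{path},{module}" for path, module in entries)


def parse_plugin_env(value):
    """Split the variable back into (path, module) pairs."""
    return [tuple(item.split(",", 1)) for item in value.split(":") if item]


def resolve_plugins(central, local):
    """Local plugins override central ones with the same name; new names are added."""
    merged = dict(central)
    merged.update(local)
    return merged


local_entries = [
    ("/home/user/plugins/movieplotter", "movieplotter_wrapper"),
    ("/home/user/plugins/newtool", "newtool_wrapper"),
]
# Python equivalent of `export`: visible to child processes started from here.
os.environ["EVALUATION_SYSTEM_PLUGINS"] = plugin_env_value(local_entries)
print(os.environ["EVALUATION_SYSTEM_PLUGINS"])

central = {"MoviePlotter": "/central/plugins/movieplotter",
           "OtherTool": "/central/plugins/othertool"}
local = {"MoviePlotter": "/home/user/plugins/movieplotter",
         "NewTool": "/home/user/plugins/newtool"}
print(resolve_plugins(central, local))
```

Because the override only exists in the user's own environment, other users keep running the central version. This is what lets a plugin be tested "without interference".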

Plugin → Database → Plugin

1. A special Freva function in the plugin wrapper adds outputs to the databrowser:

if config_dict["link2database"]:
    # Index the plugin's results under user-chosen project/product facets.
    self.add_output_to_databrowser(
        outputdir,
        project=config_dict["project"],
        product=config_dict["product"],
    )

2. The result is now part of the user's database
3. It is ready as input for a new plugin
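The three steps above form a loop. A plugin's output is indexed under user-chosen facets, and the next plugin finds it by searching those facets. The toy model below illustrates that loop. `UserDatabrowser`, its methods and the facet values are hypothetical. They only mimic what `add_output_to_databrowser` and a databrowser search achieve together.

```python
import tempfile
from pathlib import Path


class UserDatabrowser:
    """Hypothetical in-memory stand-in for a user's part of the databrowser."""

    def __init__(self):
        self._entries = []  # list of (facets, path) tuples

    def add_output(self, output_dir, project, product):
        """Index every NetCDF file in output_dir under the given facets."""
        files = sorted(Path(output_dir).glob("*.nc"))
        for f in files:
            self._entries.append(({"project": project, "product": product}, str(f)))
        return len(files)

    def search(self, **facets):
        """Return indexed paths whose facets match all given key=value pairs."""
        return [path for meta, path in self._entries
                if all(meta.get(k) == v for k, v in facets.items())]


with tempfile.TemporaryDirectory() as outdir:
    # Step 1: plugin A writes its results and links them to the databrowser.
    for name in ("anomaly_1990.nc", "anomaly_1991.nc"):
        (Path(outdir) / name).touch()
    browser = UserDatabrowser()
    browser.add_output(outdir, project="user-jdoe", product="anomalies")

    # Steps 2-3: plugin B selects its input by facets, not by a hard-coded path.
    inputs = browser.search(project="user-jdoe", product="anomalies")
    print([Path(p).name for p in inputs])  # ['anomaly_1990.nc', 'anomaly_1991.nc']
```

Selecting inputs by facets rather than by path is the design point. The follow-up plugin does not need to know where plugin A wrote its files.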

How to use Freva?

Freva framework:
  • Extensive documentation, updated regularly
  • Accessible from every Freva web interface
Freva plugins:
  • Plugins have short descriptions
  • Setups have detailed configuration info
  • Many plugins have documentation pages

What's next?

  • Freva REST API: connects to Solr so that data can be searched from many programming languages (working!)
  • Freva Client: freva library for data search from Python and the CLI (working!)
  • Data streaming: stream data as Zarr from anywhere (file system, cloud, tape archive) (WIP)
  • Freva Futures: register a dataset that will exist in the future (WIP)
  • Freva workflows: an efficient way to connect plugins (e.g. via CWL; concept)
  • ...
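Freva workflows are still a concept. The following therefore only illustrates the general idea of connecting plugins: declare which plugin consumes which plugin's output, then run them in dependency order. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The plugin names and the `run_workflow`/`fake_plugin` helpers are hypothetical and unrelated to CWL or any Freva API.

```python
from graphlib import TopologicalSorter

# Each plugin maps to the set of plugins whose outputs it needs (hypothetical names).
workflow = {
    "Regridder": set(),
    "AnomalyCalculator": {"Regridder"},
    "MoviePlotter": {"AnomalyCalculator"},
    "TrendMapper": {"AnomalyCalculator"},
}


def run_workflow(graph, run_plugin):
    """Run plugins in an order that respects every dependency; return that order."""
    order = list(TopologicalSorter(graph).static_order())
    outputs = {}
    for name in order:
        # Hand the outputs of all upstream plugins to this plugin as inputs.
        upstream = {dep: outputs[dep] for dep in graph[name]}
        outputs[name] = run_plugin(name, upstream)
    return order


def fake_plugin(name, upstream):
    """Stand-in for a real plugin run: returns a pretend output directory."""
    return f"/scratch/{name.lower()}"


print(run_workflow(workflow, fake_plugin))
```

A topological sort also rejects cycles (`graphlib.CycleError`), which catches workflows where two plugins would wait on each other.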
Thanks for your attention!

Contact: freva@dkrz.de

Documentation: https://freva-clint.github.io/freva/

Workshop GitLab Repository: https://gitlab.dkrz.de/freva/freva_workshop